Smart Paper Notes

Learning Context-Aware Representations of Subtrees

Cedric Cook

Category: Machine Learning

2021-11-08

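This work tackles the problem of learning efficient representations of complex, structured data, with a natural application to web page and element classification. We hypothesize that the context surrounding an element inside a web page is of high value to the problem and is currently underexploited. The work aims to solve the problem of classifying web elements as subtrees of a DOM tree by also taking their context into account. To achieve this, we first discuss current expert knowledge systems that work on structures, such as Tree-LSTM. We then propose context-aware extensions to this model. We show that the new model achieves an average F1 score of 0.7973 on a multi-class web classification task. The model generates better representations for various subtrees and may be used for applications such as element classification and state estimation in reinforcement learning over the Web.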

translated by Google Translate

Current State of Community-Driven Radiological AI Deployment in Medical Imaging

Vikash Gupta , Barbaros Selnur Erdal , Carolina Ramirez , Ralf Floca , Laurence Jackson , Brad Genereaux , Sidney Bryson , Christopher P Bridge , Jens Kleesiek , Felix Nensa

Category: Artificial Intelligence

2022-12-29

Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.

High-resolution canopy height map in the Landes forest (France) based on GEDI, Sentinel-1, and Sentinel-2 data with a deep learning approach

Martin Schwartz , Philippe Ciais , Catherine Ottlé , Aurelien De Truchis , Cedric Vega , Ibrahim Fayad , Martin Brandt , Rasmus Fensholt , Nicolas Baghdadi , François Morneau

Category: Computer Vision

2022-12-20

In intensively managed forests in Europe, where forests are divided into stands of small size and may show heterogeneity within stands, a high spatial resolution (10-20 meters) is arguably needed to capture the differences in canopy height. In this work, we developed a deep learning model based on multi-stream remote sensing measurements to create a high-resolution canopy height map over the "Landes de Gascogne" forest in France, a large maritime pine plantation of 13,000 km$^2$ with flat terrain and intensive management. This area is characterized by even-aged and mono-specific stands, of a typical length of a few hundred meters, harvested every 35 to 50 years. Our deep learning U-Net model uses multi-band images from Sentinel-1 and Sentinel-2 with composite time averages as input to predict tree height derived from GEDI waveforms. The evaluation is performed with external validation data from forest inventory plots and a stereo 3D reconstruction model based on SkySat imagery available at specific locations. We trained seven different U-Net models based on a combination of Sentinel-1 and Sentinel-2 bands to evaluate the importance of each instrument in the dominant height retrieval. The model outputs allow us to generate a 10 m resolution canopy height map of the whole "Landes de Gascogne" forest area for 2020 with a mean absolute error of 2.02 m on the test dataset. The best predictions were obtained using all available satellite layers from Sentinel-1 and Sentinel-2, but using only one satellite source also provided good predictions. For all validation datasets in coniferous forests, our model showed better metrics than previous canopy height models available in the same region.

Predicting Properties of Quantum Systems with Conditional Generative Models

Haoxiang Wang , Maurice Weber , Josh Izaac , Cedric Yen-Yu Lin

Category: Machine Learning

2022-11-30

Machine learning has emerged recently as a powerful tool for predicting properties of quantum many-body systems. For many ground states of gapped Hamiltonians, generative models can learn from measurements of a single quantum state to reconstruct the state accurately enough to predict local observables. Alternatively, kernel methods can predict local observables by learning from measurements on different but related states. In this work, we combine the benefits of both approaches and propose the use of conditional generative models to simultaneously represent a family of states, by learning shared structures of different quantum states from measurements. The trained model allows us to predict arbitrary local properties of ground states, even for states not present in the training data, and without necessitating further training for new observables. We numerically validate our approach (with simulations of up to 45 qubits) for two quantum many-body problems, 2D random Heisenberg models and Rydberg atom systems.

Reproducibility in medical image radiomic studies: contribution of dynamic histogram binning

Darryl E. Wright , Cole Cook , Jason Klug , Panagiotis Korfiatis , Timothy L. Kline

Category: Computer Vision

2022-11-09

The de facto standard of dynamic histogram binning for radiomic feature extraction leads to an elevated sensitivity to fluctuations in annotated regions. This may impact the majority of radiomic studies published recently and contribute to issues regarding poor reproducibility of radiomic-based machine learning that has led to significant efforts for data harmonization; however, we believe the issues highlighted here are comparatively neglected, but often remedied by choosing static binning. The field of radiomics has improved through the development of community standards and open-source libraries such as PyRadiomics. But differences in image acquisition, systematic differences between observers' annotations, and preprocessing steps still pose challenges. These can change the distribution of voxels altering extracted features and can be exacerbated with dynamic binning.
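
The contrast between dynamic and static binning described above can be illustrated with a minimal sketch. It assumes that "dynamic" means a fixed bin count spread over the region's own intensity range and "static" means a fixed bin width anchored at a fixed origin; the function names, the toy intensities, and the bin settings are hypothetical and are not taken from the paper or from PyRadiomics.

```python
import numpy as np

def dynamic_bins(values, n_bins=8):
    """Fixed bin count: edges span the region's own min..max, so they move with the region."""
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Digitizing against the interior edges yields labels 0..n_bins-1.
    return np.digitize(values, edges[1:-1])

def static_bins(values, bin_width=25.0, origin=0.0):
    """Fixed bin width: edges are anchored at `origin`, independent of the region."""
    return np.floor((values - origin) / bin_width).astype(int)

# Toy intensities for an annotated region (hypothetical values).
roi = np.array([10.0, 40.0, 60.0, 90.0, 120.0, 150.0, 180.0, 200.0])
# The same region after the annotation grows by one bright voxel.
roi_plus = np.append(roi, 400.0)

# Count how many of the original voxels receive a different bin label.
dyn_changed = int(np.sum(dynamic_bins(roi) != dynamic_bins(roi_plus)[:-1]))
sta_changed = int(np.sum(static_bins(roi) != static_bins(roi_plus)[:-1]))
print("voxels relabeled (dynamic):", dyn_changed)
print("voxels relabeled (static):", sta_changed)
```

In this toy case a single extra voxel shifts the dynamic bin edges and relabels voxels that did not change, while the static labels of the original voxels stay the same. Features computed from the binned values would inherit that difference.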

Can We Automate the Analysis of Online Child Sexual Exploitation Discourse?

Darren Cook , Miri Zilka , Heidi DeSandre , Susan Giles , Adrian Weller , Simon Maskell

Category: Natural Language Processing

2022-09-25

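The growing popularity of social media raises concerns about children's online safety. Interactions between minors and adults with predatory intentions are a particularly grave concern. Research into online sexual grooming has often relied on domain experts to manually annotate conversations, which limits both scale and scope. In this work, we test how well automated methods can detect conversational behaviors and replace an expert human annotator. Informed by psychological theories of online grooming, we label 6772 chat messages sent by child sex offenders with one of eleven predatory behaviors. We train bag-of-words and natural language inference models to classify each behavior, and show that the best-performing models classify behaviors in a manner that is consistent with, but not on par with, human annotation.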

WiForceSticker: Batteryless, Thin Sticker-like Flexible Force Sensor

Agrim Gupta , Daegue Park , Shayaun Bashar , Cedric Girerd , Tania Morimoto , Dinesh Bharadia

Category: Robotics

2022-09-19

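Any two objects in contact with each other exert a force, whether simply due to gravity or due to mechanical contact, such as a robotic arm gripping an object or even the contact between two bones at our knee joints. The ability to naturally measure and monitor these contact forces enables a wide range of applications, from warehouse management (detecting faulty packages based on weight) to robotics (making a robotic arm's grip as sensitive as human skin) and healthcare (knee implants). It is challenging to design a ubiquitous force sensor that can be used naturally in all of these applications. First, the sensor should be small enough to fit into narrow spaces. Next, we do not want to lay cumbersome cables to read the force values from the sensor. Finally, we need a battery-free design to support in-vivo applications. We develop WiForceSticker, a wireless, battery-free, sticker-like force sensor that can be deployed ubiquitously on any surface, such as warehouse packages, robotic arms, and knee joints. WiForceSticker first designs a tiny $4$ mm $\times$ $2$ mm $\times$ $0.4$ mm capacitive sensor equipped with a $10$ mm $\times$ $10$ mm antenna on a flexible PCB substrate. Second, it introduces a new mechanism that interfaces the sensor with COTS RFID systems, so that the force information can be read wirelessly by a reader. The sensor can detect forces in the range of $0$-$6$ N with a sensing accuracy of $<0.5$ N across multiple testing environments, and was evaluated with over 10,000 presses at varying force levels. We also showcase two application case studies with the designed sensors: weighing warehouse packages and sensing the forces applied by bone joints.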

Private Synthetic Data for Multitask Learning and Marginal Queries

Giuseppe Vietri , Cedric Archambeau , Sergul Aydore , William Brown , Michael Kearns , Aaron Roth , Ankit Siva , Shuai Tang , Zhiwei Steven Wu

Category: Machine Learning

2022-09-15

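We provide a differentially private algorithm for producing synthetic data that is simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to handle numerical features directly, in contrast to many related prior approaches, which require numerical features to be converted first into high-cardinality categorical features via a binning strategy. Higher binning granularity is required for better accuracy, but it negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data that preserves large numbers of statistical queries, such as marginals on numerical features and class-conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in the real and the synthetic data. This is the property needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high-quality synthetic data for mixed marginal queries, which combine categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques and provides significant accuracy improvements on both marginal queries and linear prediction tasks for mixed-type datasets.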
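
A class-conditional linear threshold query of the kind mentioned above can be sketched as follows. This is only an illustration of the query and of comparing its value on two datasets; it is not the paper's private algorithm. The function name, the normalization (fraction of all points that carry the label and lie above the half-space), and the toy data are assumptions.

```python
import numpy as np

def class_conditional_threshold_query(X, y, w, b, label):
    """Fraction of all points that have class `label` and satisfy w.x > b.

    The normalization by the full dataset size is an assumption made for
    this sketch; other conventions (e.g. per-class fractions) are possible.
    """
    above = X @ w > b
    return float(np.mean(above & (y == label)))

# Hypothetical "real" dataset and a hand-made stand-in for synthetic data.
real_X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synth_X = np.array([[0.1, 0.1], [0.9, 0.2], [0.2, 0.9], [0.8, 0.8]])
labels = np.array([0, 0, 1, 1])

w, b = np.array([1.0, 1.0]), 0.5  # half-space: x1 + x2 > 0.5

for label in (0, 1):
    real_q = class_conditional_threshold_query(real_X, labels, w, b, label)
    synth_q = class_conditional_threshold_query(synth_X, labels, w, b, label)
    print(label, real_q, synth_q, abs(real_q - synth_q))
```

When the query values agree across many half-spaces and labels, a linear classifier trained on the synthetic data sees roughly the same class balance on each side of a candidate decision boundary as it would on the real data.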

TSInterpret: A unified framework for time series interpretability

Jacqueline Höllig , Cedric Kulbach , Steffen Thoma

Category: Machine Learning

2022-08-10

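With the increasing application of deep learning algorithms to time series classification, especially in high-stakes scenarios, the ability to interpret those algorithms becomes key. Although research on time series interpretability has grown, accessibility for practitioners remains an obstacle. Without a unified API or framework, interpretability methods and their visualizations are used in widely varying ways. To close this gap, we introduce TSInterpret, an easily extensible open-source Python library for interpreting the predictions of time series classifiers that combines existing interpretation approaches into one unified framework. The library (i) features state-of-the-art interpretability algorithms, (ii) exposes a unified API that enables users to work with explanations consistently, and (iii) provides suitable visualizations for each explanation.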

Uncertainty Calibration in Bayesian Neural Networks via Distance-Aware Priors

Gianluca Detommaso , Alberto Gasparin , Andrew Wilson , Cedric Archambeau

Category: (Statistics) Machine Learning | Artificial Intelligence | Machine Learning

2022-07-17

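As we move away from the data, predictive uncertainty should increase, since a great variety of explanations are consistent with the little available information. We introduce Distance-Aware Prior (DAP) calibration, a method to correct the overconfidence of Bayesian deep learning models outside the training domain. We define DAPs as prior distributions over the model parameters that depend on the inputs through a measure of their distance from the training set. DAP calibration is agnostic to the posterior inference method and can be performed as a post-processing step. We demonstrate its effectiveness against several baselines on a variety of classification and regression problems, including benchmarks designed to test the quality of predictive distributions away from the data.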
